Bimodal speech recognition using coupled hidden Markov models

نویسندگان

  • Stephen M. Chu
  • Thomas S. Huang
چکیده

In this paper we present a bimodal speech recognition system in which the audio and visual modalities are modeled and integrated using coupled hidden Markov models (CHMMs). CHMMs are probabilistic inference graphs that have hidden Markov models as sub-graphs. Chains in the corresponding inference graph are coupled through matrices of conditional probabilities modeling temporal influences between their hidden state variables. The coupling probabilities are both cross chain and cross time. The later is essential for allowing temporal influences between chains, which is important in modeling bimodal speech. Our bimodal speech recognition system employs a two-chain CHMM, with one chain being associated with the acoustic observations, the other with the visual features. A deterministic approximation for maximum a posteriori (MAP) estimation is used to enable fast classification and parameter estimation. We evaluated the system on a speaker independent connected-digit task. Comparing with an acoustic-only ASR system trained using only the audio channel of the same database, the bimodal system consistently demonstrates improved noise robustness at all SNRs. We further compare the CHMM system reported in this paper with our earlier bimodal speech recognition system in which the two modalities are fused by concatenating the audio and visual features. The recognition results clearly show the advantages of the CHMM framework in the context of bimodal speech recognition.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Medium vocabulary continuous audio-visual speech recognition

This paper presents our experiments on continuous audiovisual speech recognition. A number of bimodal systems using feature fusion or fusion within Hidden Markov Models are implemented. Experiments with different fusion techniques and their results are presented. Further the performance levels of the bimodal system and a unimodal speech recognizer under noisy conditions are compared.

متن کامل

Medium Vocabulary Continu Speech Recognit

This paper presents our experiments on continuous audiovisual speech recognition. A number of bimodal systems using feature fusion or fusion within Hidden Markov Models are implemented. Experiments with different fusion techniques and their results are presented. Further the performance levels of the bimodal system and a unimodal speech recognizer under noisy conditions are compared.

متن کامل

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000